1. INTRODUCTION


The Simulated Aggregation Test (SAT): U.S. Corn and Soybean Exploratory
Experiment was executed (1) to determine the labeling accuracy obtainable with
the current corn and soybean labeling procedure and to determine the crop
proportion-estimation errors of the resulting proportion estimates; (2) to
compare the corn and soybean labeling procedure utilized in the SAT with that
utilized in the Classification Procedures Verification Test (PVT) via a
comparison of the labeling accuracy and the proportion-estimation errors of
the two procedures; and (3) to test the aggregation logic for obtaining crop
area and production estimates at state and regional levels. This report
presents the results of (1) and (2).

The design of the SAT called for three analyst-interpreter (AI) groups (two
from NASA and one from Lockheed) to label 50 to 70 Type 1 dots on each of 88
segments located in 5 agro-physical units (APU's) in 6 states of the U.S.
Corn Belt. Each segment was to be labeled once only using a modified version
of the corn and soybean labeling procedure utilized in the PVT (refs. 1
and 2).

Of the 88 segments labeled, 23 were a subset of the 29 blind sites processed 
in the PVT; 35 were additional blind sites; and the remaining 30 were nonblind 
sites. All the 23 segments in the SAT that were also processed in the PVT 
(hereafter referred to as Group 1 segments) had digitized ground truth 
available. Of the additional 35 blind sites (hereafter referred to as Group 2 
segments), 18 had digitized ground truth available, and the remaining 17 had 
400-dot ground truth available. 

Since the NASA groups had already seen the ground truth for the Group 1
segments, it was stipulated that these 23 segments would be processed by the
Lockheed group. Otherwise, there were no constraints on the assignment of
segments to the AI groups. Table 1-1 shows the assignment of the blind sites
to the APU's and AI groups.
2. ANALYSIS OF THE SIMULATED AGGREGATION TEST


Analyses were made to investigate the crop proportion-estimation accuracy and 
dot-labeling accuracy in the SAT as well as to compare the crop proportion- 
estimation accuracy and dot-labeling accuracy of the SAT with that of the PVT. 


2.1 CROP PROPORTION-ESTIMATION ACCURACY IN THE SIMULATED AGGREGATION TEST 


Initially, a linear model of the form

$$\hat{p}_{ijk} - p_{ijk} = \mu + A_i + G_j + (AG)_{ij} + \varepsilon_{(ij)k}$$

was assumed, where

$\hat{p}_{ijk}$ = the proportion estimate of the crop of interest for the kth
segment of the ith APU, labeled by the jth group

$p_{ijk}$ = the corresponding ground truth proportion

$\mu$ = the overall mean difference

$A_i$ = the effect of the ith APU (fixed)

$G_j$ = the effect of the jth group (random)

$(AG)_{ij}$ = the interaction of the ith APU and the jth group (mixed)

$\varepsilon_{(ij)k}$ = the random error resulting from the kth segment of the
ith APU, labeled by the jth group, assumed NID$(0, \sigma^2)$.


However, for the crops of interest (corn and soybeans), the model accounted
for less than 29 percent of the observed variation. (Table 2-1 gives the
coefficient of determination, $R^2$, for each crop.) Hence, the analyses were
performed without regard to APU or group effects.
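
As an illustration of how a coefficient of determination of this kind can be
obtained, the following is a minimal sketch and not the procedure used in the
experiment. It assumes that every effect is treated as fixed for the purpose
of computing $R^2$, in which case the fitted value for each segment under the
model with APU, group, and interaction terms is simply the mean of its
APU-by-group cell. The function name and the input layout are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def r_squared_two_way(observations):
    """Return R^2 for a two-way model with interaction (cell-means fit).

    observations: iterable of (apu, group, error) tuples, where error is the
    proportion estimate minus the ground truth proportion for one segment.
    Assumption: all effects are treated as fixed, so each fitted value is
    the mean of its (apu, group) cell.
    """
    cells = defaultdict(list)
    for apu, group, error in observations:
        cells[(apu, group)].append(error)

    all_errors = [e for values in cells.values() for e in values]
    grand_mean = fmean(all_errors)

    # Total variation about the overall mean difference.
    ss_total = sum((e - grand_mean) ** 2 for e in all_errors)
    if ss_total == 0:
        raise ValueError("all errors are identical; R^2 is undefined")

    # Residual variation about the cell means.
    ss_residual = sum(
        (e - fmean(values)) ** 2 for values in cells.values() for e in values
    )
    return 1.0 - ss_residual / ss_total
```

A value well below 1 indicates that APU and group membership explain little
of the segment-to-segment variation in estimation error, which is the
situation the text describes.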


Plots of ground truth proportions (abscissa) versus crop proportion-estimation
error (ordinate) are displayed in figures 2-1(a) for corn and 2-1(b) for
soybeans. Overestimation of corn and underestimation of soybeans are clearly
evident, a pattern that also emerged in the PVT (ref. 3).
Figure 2-1.- Crop proportion-estimation accuracy for the SAT: (a) Corn;
(b) Soybeans.
absolute value of the proportion-estimation error (absolute error) of each
Group 1 segment with the mean absolute error of the corresponding PVT segment
by means of the difference: mean absolute error minus absolute error.

The hypothesis of a mean difference of zero versus all alternatives was then
tested (α = 0.05). The results, displayed in table 2-4, show no significant
difference in the proportion-estimation accuracy of corn; however, soybeans
were underestimated to a significantly greater degree in the Group 1 segments
(a mean difference of -2.60 percent).

2.2.2 COMPARISON OF THE GROUP 2 SEGMENTS WITH THE CLASSIFICATION PROCEDURES 
VERIFICATION TEST 

The analysis for the comparison of the Group 2 proportion-estimation accuracy
with the PVT proportion-estimation accuracy consisted of testing the null
hypothesis that the mean error of the PVT segments minus the mean error of the
Group 2 segments was zero against all alternatives (α = 0.05). Table 2-5
displays the results of this test. Corn was overestimated to a significantly
greater degree and soybeans underestimated to a significantly greater degree
in the Group 2 segments.


2.3 LABELING ACCURACY OF THE SIMULATED AGGREGATION TEST 

Tables 2-6(a) through 2-6(c) display, for all SAT blind sites, for the Group 1
segments, and for the Group 2 segments, respectively, the percentage of a
given crop category labeled "corn," "soybeans," and "other" (neither corn nor
soybeans). With errors of omission being essentially equal for corn and
soybeans, the confusion errors for Group 1 and Group 2 together [table 2-6(a)]
indicate that the AI groups could recognize corn signatures more readily than
soybean signatures. This failure to discriminate soybeans from corn is due to
late planting of soybeans, which makes the signatures of these late-planted
soybeans spectrally inseparable from corn. As a result, corn is overestimated
and soybeans underestimated.
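
The asymmetry between the two confusion errors can be made concrete with a
short worked calculation on the figures of table 2-6(a). This is only a sketch
under one assumption that the report does not state explicitly: that the
"ground truth proportion" column gives the share of all labeled dots falling
in each ground truth category, so that multiplying it by a row of label
percentages gives shares of all dots. The function names are hypothetical.

```python
# Values transcribed from table 2-6(a); all figures are percentages.
GROUND_TRUTH_SHARE = {"corn": 43.36, "soybeans": 30.25, "other": 26.39}

# LABEL_DISTRIBUTION[truth][label] = percent of dots in ground truth category
# `truth` that the analyst-interpreters labeled as `label`.
LABEL_DISTRIBUTION = {
    "corn": {"corn": 92.58, "soybeans": 1.62, "other": 5.80},
    "soybeans": {"corn": 6.87, "soybeans": 87.58, "other": 5.54},
    "other": {"corn": 2.92, "soybeans": 1.14, "other": 95.93},
}


def exchange(truth, label):
    """Percent of ALL dots that belong to `truth` but were labeled `label`.

    Assumption: the ground truth proportion is a share of the labeled dots.
    """
    return GROUND_TRUTH_SHARE[truth] * LABEL_DISTRIBUTION[truth][label] / 100.0


def labeled_share(label):
    """Percent of all dots carrying `label`, summed over ground truth rows."""
    return sum(exchange(truth, label) for truth in GROUND_TRUTH_SHARE)


if __name__ == "__main__":
    print("soybeans labeled corn:", round(exchange("soybeans", "corn"), 2))
    print("corn labeled soybeans:", round(exchange("corn", "soybeans"), 2))
    for crop in ("corn", "soybeans"):
        print(crop, "labeled share:", round(labeled_share(crop), 2))
```

Under that assumption, soybean dots labeled corn amount to about 2.08 percent
of all dots, against about 0.70 percent for corn dots labeled soybeans, and
the pooled labeled shares come to about 42.99 percent for corn and 27.50
percent for soybeans, compared with ground truth proportions of 43.36 and
30.25 percent.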
TABLE 2-6.- DISTRIBUTION OF LABELS WITHIN EACH
GROUND TRUTH CATEGORY


(a) All SAT blind sites

                            Label                   Ground truth
Ground truth    Corn,      Soybeans,    Other,      proportion,
                percent    percent      percent     percent

Corn            92.58       1.62         5.80       43.36
Soybeans         6.87      87.58         5.54       30.25
Other            2.92       1.14        95.93       26.39


(b) Group 1 blind sites

                            Label                   Ground truth
Ground truth    Corn,      Soybeans,    Other,      proportion,
                percent    percent      percent     percent

Corn            88.25       1.77         9.98       44.00
Soybeans         7.97      83.33         8.70       26.93
Other            3.69       2.35        93.96       29.07


(c) Group 2 blind sites

                            Label                   Ground truth
Ground truth    Corn,      Soybeans,    Other,      proportion,
                percent    percent      percent     percent

Corn            94.89       1.54         3.56       43.03
Soybeans         6.39      89.46         4.15       31.99
Other            2.45       0.41        97.14       24.99
reducing the underestimation of soybeans, indicating that committing soybeans
with corn has a greater impact on soybean proportion-estimation accuracy than 
the mislabeling of soybeans as "other." 

2.4 COMPARISON OF THE DOT-LABELING ACCURACY OF THE SIMULATED AGGREGATION TEST 
AND THE CLASSIFICATION PROCEDURES VERIFICATION TEST 

Dot-labeling accuracy for the PVT, the Group 1 segments, the Group 2 segments, 
and the Group 1 and Group 2 segments combined is displayed in table 2-7. 
Overall, the labeling accuracy of the SAT improved over that of the PVT, with 
the labeling accuracy of the Group 2 segments contributing the most to this 
improvement. However, since dot-labeling accuracy data at the segment level
were available only for the Group 1 segments, it was not possible to determine
if the improvement in labeling accuracy for the Group 2 segments was
significant.

The labeling accuracy of each Group 1 segment was compared with the mean
labeling accuracy of the corresponding PVT segment by subtracting the Group 1
figures from the corresponding PVT figures. The null hypothesis of a mean
difference of zero was tested against all alternatives (α = 0.05). The
results are given in table 2-8.

Since each of the 95 percent confidence intervals contains zero, the null
hypothesis that the mean difference in labeling accuracy between the PVT
segments and the SAT Group 1 segments is zero could not be rejected.
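
The confidence-interval check described above can be sketched as follows. The
report does not name the interval it used; this sketch assumes a Student's t
interval on the paired differences (PVT figure minus Group 1 figure), and the
function names are hypothetical. The critical value is passed in rather than
computed; for 23 paired segments (22 degrees of freedom) the tabulated
two-sided 95 percent value is approximately 2.074.

```python
from math import sqrt
from statistics import fmean, stdev


def paired_mean_difference_ci(pvt_values, sat_values, t_critical):
    """Return (low, high) for the mean of the paired differences PVT - SAT.

    Assumption: a Student's t interval is appropriate. t_critical is the
    two-sided critical value for len(pvt_values) - 1 degrees of freedom.
    """
    if len(pvt_values) != len(sat_values):
        raise ValueError("the two series must be paired segment by segment")
    if len(pvt_values) < 2:
        raise ValueError("at least two paired segments are required")

    differences = [p - s for p, s in zip(pvt_values, sat_values)]
    mean_difference = fmean(differences)
    # Standard error of the mean difference (sample standard deviation).
    half_width = t_critical * stdev(differences) / sqrt(len(differences))
    return mean_difference - half_width, mean_difference + half_width


def contains_zero(interval):
    """True when the interval covers zero, i.e. the null is not rejected."""
    low, high = interval
    return low <= 0.0 <= high
```

Testing the null hypothesis at α = 0.05 and checking whether the 95 percent
interval covers zero are equivalent statements of the same decision, which is
why the text can argue directly from the intervals in table 2-8.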

2.5 ANALYST-INTERPRETER LABELED, TYPE 1 DOT PROPORTION ESTIMATES

Crop proportion estimates of corn and soybeans were made for each blind site 
by using the proportion of dots labeled corn and the proportion of dots 
labeled soybeans. Figures 2-2(a) for corn and 2-2(b) for soybeans display
plots of ground truth proportions versus the dot proportion-estimation error.

TABLE 2-9.- CLASSIFICATION ERRORS OF THE SAT

In table 2-9, the mean errors of the machine-classified estimates and the dot
estimates are displayed. For both corn and soybeans, the Type 1 dots, as a
random sample, produced smaller estimation errors, with the dot-estimation
error for corn not significantly different from zero, although the estimate of
soybeans is biased. However, the mean square errors for the two types of
classification are not appreciably different, indicating that if the dot
estimates are not better than the machine-classified estimates, then certainly
they are no worse.
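
A smaller mean error does not by itself imply a smaller mean square error,
because the mean square error equals the squared bias plus the variance of the
errors. The sketch below shows that decomposition for one set of estimates; it
is illustrative only, and the function name and return layout are
hypothetical.

```python
from statistics import fmean


def error_summary(estimates, ground_truth):
    """Return the bias, variance, and mean square error of the estimates.

    The variance is the population variance of the errors, so that
    mse == bias ** 2 + variance holds exactly (up to rounding).
    """
    if len(estimates) != len(ground_truth) or not estimates:
        raise ValueError("need one ground truth value per estimate")

    errors = [e - g for e, g in zip(estimates, ground_truth)]
    bias = fmean(errors)
    mse = fmean([err ** 2 for err in errors])
    return {"bias": bias, "variance": mse - bias ** 2, "mse": mse}
```

Calling this once with the machine-classified estimates and once with the dot
estimates gives the two mean square errors being compared; an estimator with
the smaller bias can still have a similar mean square error if its errors are
more variable.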

To compare the types of classification, two procedures were used. The first
procedure, utilizing the binomial test, was to investigate whether or not one
type of classification tended to yield superior estimation accuracy over the
other. The first step in this procedure was determining the proportion of
segments for which the dot estimates produced smaller absolute deviations
from ground truth. (See "Improved," table 2-10.) Then the null hypothesis
that this proportion was equal to 50 percent was tested (α = 0.05). For both
corn and soybeans, the null hypothesis was not rejected. In other words,
machine classification is no more likely to yield accurate estimates than a
random sample of Type 1 dots.
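
The binomial procedure just described can be sketched in two steps: count the
segments on which the dot estimate is closer to ground truth, then test that
count against a 50 percent success probability. The report does not give the
exact form of the test; this sketch assumes an exact two-sided binomial test
that sums the probabilities of all outcomes no more likely than the one
observed. Function names are hypothetical, and ties are counted as "not
improved," which is also an assumption.

```python
from math import comb


def improved_count(dot_estimates, machine_estimates, ground_truth):
    """Count segments where the dot estimate has the smaller absolute error."""
    return sum(
        abs(d - g) < abs(m - g)
        for d, m, g in zip(dot_estimates, machine_estimates, ground_truth)
    )


def binomial_two_sided_p(successes, trials, p=0.5):
    """Exact two-sided binomial p-value for `successes` out of `trials`."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")

    probabilities = [
        comb(trials, i) * p ** i * (1 - p) ** (trials - i)
        for i in range(trials + 1)
    ]
    observed = probabilities[successes]
    # Sum every outcome at most as likely as the observed one; the small
    # tolerance guards against floating-point ties.
    total = sum(q for q in probabilities if q <= observed * (1 + 1e-9))
    return min(1.0, total)
```

The null hypothesis of a 50 percent proportion is rejected at α = 0.05 only
when the returned p-value falls below 0.05.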

To further qualify the comparison, the mean improvement of machine-classified
estimates over dot estimates (see table 2-10) was obtained by finding the
mean, on a segment-by-segment basis, of the absolute deviation from ground
truth of the machine-classified estimate minus the absolute deviation from
ground truth of the dot estimate. The null hypothesis of no improvement
(α = 0.05) was tested and could not be rejected.

Thus, machine classification does not improve upon a random sample of Type 1, 
analyst-labeled dots whether measured as a reduction of mean square error, a 
likelihood of yielding more accurate estimates, or a mean difference in 
estimation accuracy. 
3. SUMMARY OF RESULTS


The following results emerged from the evaluation of the SAT: 

1. Corn was significantly overestimated by an average of 4.58 percent per
segment (standard deviation, 6.95 percent), and soybeans were significantly
underestimated by an average of 7.81 percent per segment [standard
deviation, 5.57 percent (table 2-2)].

2. When comparing the proportion-estimation accuracy of the Group 1 SAT
segments with the PVT segments, no significant difference emerged for corn;
however, soybeans were underestimated to a significantly greater degree in
the SAT segments (table 2-4).

3. When comparing the proportion-estimation accuracy of the Group 2 SAT
segments with the PVT segments, corn was overestimated to a significantly
greater degree and soybeans underestimated to a significantly greater
degree in the SAT segments (table 2-5).

4. The labeling accuracy of the Group 2 segments was higher than that of the
Group 1 segments as a result of fewer corn and soybean dots being
mislabeled as "other" in the Group 2 segments [tables 2-6(b) and 2-6(c)].

5. In the SAT, more soybeans were labeled corn than corn was labeled
soybeans. This was caused by the spectral inseparability of late-planted
soybeans from corn [tables 2-6(a) through 2-6(c)].

6. The spectral inseparability of late-planted soybeans from corn resulted in
the overestimation of corn and underestimation of soybeans.

7. Since fewer corn and soybean dots were mislabeled "other" in the Group 2
segments (as compared with the Group 1 segments), the estimation of corn
was further inflated, although the reduction in mislabeling had little
effect on the soybean proportion estimates [tables 2-6(b) and 2-6(c)].

8. Overall, labeling accuracy in the SAT improved over that in the PVT.
However, there was no significant difference in labeling accuracy between the
PVT and Group 1 segments (tables 2-7 and 2-8).
4. RECOMMENDATIONS


An alternate machine classification technique should be developed since the
procedure used in this experiment did not improve upon a random sample of
analyst-labeled Type 1 dots. Methods should also be developed to compensate
for the adverse effect that late-planted soybeans have upon corn and soybean
proportion-estimation accuracy.